Noisy Data Make the Partial Digest Problem NP-hard
Abstract
The Partial Digest problem asks for the coordinates of $n$ points on a line such that the pairwise distances of the points form a given multiset of $\binom{n}{2}$ distances. It arises, for instance, in DNA physical mapping and de novo sequencing of proteins. Although Partial Digest was proposed as a combinatorial problem as early as the 1930s, its computational complexity is still unknown. To model real-life data, we introduce two optimization variations of Partial Digest, each capturing a different type of error that occurs in practice. First, we study the computational complexity of a minimization version of Partial Digest in which only a subset of all pairwise distances is given and the rest are missing due to experimental errors. We show that this variation is NP-hard to solve exactly, which answers an open question posed by Pevzner (2000). We then study a maximization version of Partial Digest in which a superset of all pairwise distances is given, with some additional distances due to inaccurate measurements. We show that this maximization version is NP-hard to approximate to within a factor of $|D|^{1/2-\varepsilon}$ for any $\varepsilon > 0$, where $|D|$ is the number of input distances. This inapproximability result is tight up to low-order terms, as we give a trivial approximation algorithm that achieves a matching approximation ratio.
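To make the problem statement concrete, the following minimal Python sketch computes the multiset of pairwise distances of a point set and checks it against an input. The function names and the example point set are illustrative assumptions, not taken from the paper, and the two relaxed checks are only one plausible reading of the feasibility conditions behind the minimization (missing distances) and maximization (extra distances) variants described above.

```python
from collections import Counter
from itertools import combinations


def pairwise_distances(points):
    """Return the multiset of the C(n, 2) pairwise distances of the points."""
    return Counter(abs(a - b) for a, b in combinations(points, 2))


def is_exact_solution(points, distances):
    """Classic Partial Digest: the distance multisets must match exactly."""
    return pairwise_distances(points) == Counter(distances)


def covers_all(points, distances):
    """Assumed check for the 'missing distances' setting:
    every given distance occurs among the points (extras are allowed)."""
    missing = Counter(distances) - pairwise_distances(points)
    return not missing


def uses_only(points, distances):
    """Assumed check for the 'additional distances' setting:
    every distance among the points occurs in the input (leftovers are allowed)."""
    extra = pairwise_distances(points) - Counter(distances)
    return not extra


# Hypothetical example instance: 5 points yield C(5, 2) = 10 distances.
points = [0, 2, 4, 7, 10]
full = sorted(pairwise_distances(points).elements())
print(full)
```

Under these assumptions, the minimization variant would look for the smallest point set passing `covers_all`, and the maximization variant for the largest point set passing `uses_only`; the sketch only verifies candidate solutions and does not search for them.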
Similar resources
Noisy Data Make the Partial Digest Problem NP-hard
The Partial Digest problem, well known for its applications in computational biology and for the intriguingly open status of its computational complexity, asks for the coordinates of $n$ points on a line such that the pairwise distances of the points form a given multiset of $\binom{n}{2}$ distances. In an effort to model real-life data, we study the computational complexity of a minimization version of...
Measurement Errors Make the Partial Digest Problem NP-Hard
The Partial Digest problem asks for the coordinates of m points on a line such that the pairwise distances of the points form a given
Modeling of Partial Digest Problem as a Network flows problem
Restriction Site Mapping is one of the interesting tasks in Computational Biology. A DNA strand can be thought of as a string on the letters A, T, C, and G. When a particular restriction enzyme is added to a DNA solution, the DNA is cut at particular restriction sites. The goal of the restriction site mapping is to determine the location of every site for a given enzyme. In partial digest metho...
Partial Digest is hard to solve for erroneous input data
The Partial Digest problem asks for the coordinates of $m$ points on a line such that the pairwise distances of the points form a given multiset of $\binom{m}{2}$ distances. Partial Digest is a well-studied problem with important applications in physical mapping of DNA molecules. Its computational complexity status is open. Input data for Partial Digest from real-life experiments are always prone to erro...
Double Digest Revisited: Complexity and Approximability in the Presence of Noisy Data
We revisit the double digest problem, which occurs in sequencing of large DNA strings and consists of reconstructing the relative positions of cut sites from two different enzymes: we first show that double digest is strongly NP-complete, improving previous results that only showed weak NP-completeness. Even the (experimentally more meaningful) variation in which we disallow coincident cut sites ...